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ABSTRACT 

Since the middle of the 1940's scientists have used Monte Carlo (MC) simulations to obtain 
information about physical processes. This has proved a accurate and and reliable method to 
obtain this information. Through out resent years researchers has begone to use the slightly 
newer Markov Chain Monte Carlo (MCMC) simulation. This differs from the ordinary MC 
by using the Markov Chain. MCMC originates from Bayesian statistics. This method has 
given researchers a completely new tool to learn something about physical systems. One of 
the fields where MCMC is a good new tool, is astrophysics. Today MCMC is widely used in 
simulating power spectra for asteroseismic data. Hereby providing the scientists with impor- 
tant new information of stellar interiors. From our results we see that MCMC delivers a robust 
and reliable result with good error estimation. We also learn that MCMC is a power full tool 
which can be applied to a large verity of problems. 
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1 INTRODUCTION 

In this paper we want to investigate the properties of statistical sim- 
ulation. We will worlc on a set of data that we Imow can be de- 
scribed by a straight line. Since we know how we can describe the 
data, we also know that we can test various other methods of getting 
about the fitting of a straight line. The other methods we will com- 
pare it with is the least squares fit both weighted an non-weighted. 
The statistical algorithm we will apply to try and determine the pa- 
rameters in the straight line fit is called Markov-Chain Monte-Carlo 
(henceforth MCMC). Which has become increasingly popular dur- 
ing resent years. 

Throughout this paper we will go through various things in sec- 
tion 2 we will treat random numbers and problems with different 
random number generators. We will also address the problem of 
transforming a uniform random sample into a normal distributed 
random sample. In section 3 we will present the data set that are 
used as the test case for the MCMC. In section 4 we go into de- 
tails with least squares fitting both with and with out weighting of 
the data points we will also look at the results when fitting straight 
lines to our data. Section 5 contains an introduction to Bayesian 
statistics, and presents the fundamental Bayes' theorem. In Section 
6 we go into detail with different types of samplers, we look at two 
who build on the same principle, these are the Gibbs sampler and 
the Metropolis-Hastings Algorithm. We also show the result of our 
Markov-Chain sampling, which give a fairly good representation 
of our parameter space. Section 7 presents the underlying theory 
of the Markov-Chain and the Monte-Carlo and finally combine the 
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two into the Markov-Chain Monte-Carlo simulation. Section 8 dis- 
cuss the results of our MCMC and presents the estimates of m and b 
for the straight line given in equation (7). Section 9 present some of 
the perspectives of the MCMC simulations in astrophysics. Since 
this is becoming more and more used in this field of research. 



2 RANDOM NUMBER GENERATION AND 
TRANSFORMATION 

In Press et al. (2007) a few rules about the random sampling is men- 
tioned that is worth mentioning here as well. Some of the aspects 
of random number generators one should be aware of are: 

• Never use a generator based on a linear congruential gen- 
erator (LCG) or a multiplicative linear congruential generator 
(MLCG). 

• Never use a generator with a period less than ~ 2*'' a; 2 x 10", 
or any generator whose period is undisclosed. 

• Never use a generator that warns against using its low-order 
bits as being completely random. That was once a good advice, but 
now it seams to indicate an obsolete algorithm (normally a LCG). 

• Never ever use the build in generator of the languages C and 
C++. Especially rand and srand, they have no standard implemen- 
tation and are often badly flawed. 

Another thing we should be aware of when dealing with random 
numbers are the over engineered generators which seams to waste 
resources. 

• Avoid generator that take more than ~ two dozen arithmetic or 
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Figure 1. 10 000 random generated points from the build in rand generator. 



Figure 2. Again 10 000 random sampled numbers using a generator de- 
scribed in (Press et al. 2007, p. 342) 



logical operations to generate a 64-bit integer or double precision 
floating result. 

• Likewise one should avoid the use of generators which is de- 
signed for cryptographic use. 

• There are no real need to use generators with periods > lO'"". 

From all of this we should learn that: An acceptable random gen- 
erator must combine at least two (ideally and unrelated) methods. 
The methods combined should evolve independently and share no 
state. The combination should be by simple operations that do not 
produce results less random than their operands, (Press et al. 2007, 
p. 342). 

If we take a look at the random numbers that are produced by 
the random number generator rand which are the standard one in 
C-I-+ we find data similar to these ones, see figure (1). If we now 
compare figure ( 1 ) with the one showing the same amount of ran- 
dom sampled points using the random sampler build from the prin- 
ciples from Press et al. (2007), figure (2) we see that there are not 
that much difference in the visual appearance after only 10 000 
sampled points. But there seams to be a little more clustering of 
points in figure (1) 

For this reason we will use the random generator from l ess et al. 
(2007). Another thing we will need is a random sampler that can 
sample a Gaussian distribution, for this purpose we will turn to 
Box-Muller principle. 



2.1 Box-Muller Transformation 

This transformation can be generalised to more than one dimen- 
sion. Say xi, X2, ... are random sampled numbers, with a joint 
probability distribution p{xi, xi, ■ ■ .)dxidx2 . . . and if ji, y2, ... 
are functions of the x's (one y per x). Then we can describe the joint 
probability distribution of the y's as: 

d(Xi, X2, ...) 



P(yu y2, ■■■)dyidy2 



p(xu X2, ...) 



d{yi, yi, ■■■) 



dyidy2 ...(1) 



where |fl(xi, X2, ...)/d(yi, y2, ■ ■ ■)] is the Jacobian determinant 
of the x's with respect to the v's. This principle can be used to 
transform our uniformly distributed random samples into random 
samples that are normal (Gaussian) distributed. This distribution is 
described by: 



p(y)dy 



V2^ 



exp(-y^/2)dy 



(2) 



We can now consider the transformation between two uniform de- 
viates on (0,1), X|, X2, and two quantities ^^i, y2 as: 



>'i = -^-2 In xi cos 27rx2 
j2 = V~21nxi sin 27rx2 



We can now isolate xi and X2 by doing this we find that: 
) 



Xl 



exp 



1 y2 
— arctan — 
2n vi 



(3) 
(4) 



(5) 
(6) 



Now it is fairly straight forward to calculate the Jacobian determi- 
nant an it turns out we get: 

d(x,,X2) 

d(yi,y2) 











dx2 


3x2 


dyi 





exp(-)'?/2) 



V2^ 



exp(-)'2/2) 



From this we see that both yi and y2 are distributed according to the 
normal distribution given in equation (2). The result of out normal 
transformation can be seen in figure (3). In order to convince the 
suspicious reader that we in fact have created a normal distributed 
sample we can create the histogram of the data. This can be seen on 
figure (4). From figure (4) it is quite obvious that we have created a 
normal distributed sample since it is nicely Gaussian. 



3 THE DATA 

For the rest of the project we have chosen to work on a set of data, 
that are made up, but could be data from any part of the physical 
domain. The data set is presented in table (1), and an illustration of 
the data can be seen in figure (5). From this figure it is obvious that 
the data can be described by a straight line given by: 



y '■ 



x + b 



(7) 
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Table 1. The data used in the MCMC analysis 
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Figure 3. Our 10 000 normal distributed points, be aware that the axis have 
changed if one compares to figure ( 1 ) and figure (2). 
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Figure 4. Histogram showing the data from figure (3). 



Since the data can be described by equation (7) we know that we 
can use the hnear least squares (henceforth Isq) methods for fitting 
the optimal straight line, and thereby test the result of our MCMC 
solution. 

We are planing to do the Isq in two versions both and ordinary non- 
weighted Isq (hence OLS) and a weighted Isq (henceforth WLS). 
For computing the Isq we use the least squares routine developed 
during the course. 

Another thing one should notice about the data is that they are 
weighted, see figure (5). We also notice that not all the data points 
are weighted equally, the reason for this is to try and show that 
there is also a significant difference between the results of OLS, 
and WLS. 
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Figure 5. The data chosen for this test of MCMC. We have chosen to weight 
the data differently in order to see if this have an effect or not. 



linear set of data. Now would be a good time to fit a line to data so 
we have some idea about the parameters. 

4.1 Ordinary Linear Leas Squares Fitting 

We will start with the OLS. The OLS routine is included in the 
Isq.cpp program which is simply a modified version of what we 
have already developed in the linear least squares exercise. The 
modification is simply that we have removed the weighting of the 
individual data points. The OLS solution is m = 2.19 and h = 
32.00. The result of this fit can be seen in figure (6). This seam at 
first sight to be a rather good fit, but since we neglected the weights 
of the individual data points. This is not the optimal fit. Had the 
data been equally weighted e.g. if we did not have any information 
about the weighting of individual data points. 



4 LEAST SQUARES FITTING 

The point of Isq is to determine in and b of equation (7). This could 
be done for any arbitrary set of data, not only data which falls on 
a straight line. The only requirement for our routine is that it is a 



4.2 Weiglited Linear Least Squares Fitting 

The other approach of the is the WLS is a solution to the data. Since 
not all the data points are not equally weighted. The WLS takes the 
individual weights into account. The WLS fit to the data gives us 
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Figure 6. Here is the non-weighted OLS fit, where tti — 2.19 and b — 32.00 Figure 8. comparison between the OLS and the WLS lines 
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Figure 7. Weighted LSQ fit. with m = 2.23 and h = 34.80, to the data set. 



the following values for m = 2.23 and b = 34.80. From table (2) 
we can see that we get somewhat higher values of m and b than that 
of the OLS. If we plot the result of the WLS, we get the following 
result, see figure (7). 

When comparing the two plots figure (6) and figure (7) it is not 
obvious that there is any difference at all. But if the two results 
are plotted in the same plot we see a clear difference see figure 
(8). We can now see that our expectation that the WLS is a better 
solution of the line that describe the data, than that of OLS. Since 
we now have the optimal solution for the straight line we can now 
start our statistical modelling of the data. For this we are going to 
use MCMC. 



Table 2. This table shows the results of the Isq fitting to the data for the two 
methods of Isq used here. 



5 BAYES' THEOREM AND THE LINK TO MCMC 

In statistics there have been two branches, competing about which 
one was the correct one to use. Scientists have been changing 
from one to the other and back again as time has gone. The two 
branches are known as "frequentists" and "Bayesians" . Nowadays 
the Bayesian statistics are the dominant branch. It is also from this 
branch that the MCMC have rose. 

Bayesian statistics generally builds on the theory of Rev. Thomas 
Bayes (1702 - 1761). The reason for his importance is that he for- 
mulated the fundamental theorem now known as Bayes' theorem. 
Bayes theorem generally shows the relation between one condi- 
tional probability and its inverse, e.g. the probability of a hypoth- 
esis given observed evidence and the probability of that evidence 
given the hypothesis. In other words; Data changes probability. 
In a little more strict formulation Bayes' theorem states: 

The posterior probability (i.e. after evidence D is observed) of a hypothesis 
H in terms of the prior probabilities of H and D. and the probability of D 
given H. It implies that evidence has a stronger confirming effect if it was 
more unlikely before being observed. 

If we want to write Bayes' theorem as an equation we can do it in 
the following manor. 

Bayes theorem relates the conditional and marginal probabilities of 
the events A and B, where B has a non vanishing probability: 

p{B\A)p(A) 



p(A|B) : 



P(B) 



(8) 



In this equation p(A) is the marginal probability or prior probabil- 
ity (henceforth prior). It is a prior in the sense that it does not take 
any information about B into account. /;i(A|B) in the conditional 
probability of A, given B. this also called the posterior probability 
(hence posterior) since it depends on the specific value of B. p(B\A) 
is the conditional probability of B given A, this is also called like- 
lihood which will be the term used hence forth. The equation (8) is 
normalized by the prior of B. 
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6 SAMPLER 

In order to make our MCMC work we need to have a control if 
a new sampled point should be accepted or not. There are sev- 
eral ways of doing that one of these ways are known as the 
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Metropolis-Hastings algorithm, Hastings (1970), and the Gibbs 
sampler, Chib & Greenberg (1995); Chib & Jeliazkov (2001) . We 
will go through the theory of both of them, but in our MCMC we 
will use the Metropolis-Hastings algorithm. 



6.1 The Gibbs Sampler 

The Gibbs sampler is an algorithm used for generating a sequence 
of samples from the joint probability distribution of two or more 
random variables, see equation (1). The purpose a sequence like 
this is to approximate the joint distribution, or compute an inte- 
gral, e.g. an expectation value. In principle the Gibbs sampling is 
a special case of the of the Metropolis-Hastings algorithm. Since 
we are not going to apply the Gibbs sampling we will not go into 
further detail with here, more information on the Gibbs sampler 
can be found in Geyer (1992); Press et al. (2007); Chib & Jeliazkov 
(2001). 



6.2 The Metropolis-Hastings Algorithm 

The Metropolis-Hastings algorithm is the heart of our MCMC pro- 
gram. Since it is a method for obtaining a sequence of random num- 
bers from a probability distribution for which direct sampling is 
difficult, or impossible. The sampled sequence can then be used for 
approximating the distribution, e.g. by producing a histogram. The 
algorithm can be presented in the following way: 
Unless a transition probability functionp(jt:2|j:i) that satisfy the de- 
tailed balance equation 



(9) 



There is no possible way of continuing. That is when we need the 
Metropolis-Hastings algorithm. Which say that we should pick a 
proposal distribution q{x2\xi). This distribution can be what ever 
you like, as long at the succession of the steps generated reach 
everywhere in the region of interest. One example could be that 
^(jC2|JCi) might be a multivariate normal distribution centred around 
x\. 

Now in order to generate a step starting at Xi, we first generate a 
candidate point X2c by drawing from the proposal distribution. The 
second step is to calculate the acceptance probability a{x i , xic) this 
can be done using the following equation 



a(x\,X2c) = min 1 



7T{X2c)q{Xi\X2c) 



(10) 



n(xi)q(x2c\xi) , 

The final and third step is with the probability a(xi,X2) to accept 
the candidate point and set X2 = X2c- if this is not possible the point 
should be rejected and hereby leave the point unchanged meaning 
X2 = Xi. The netto result of this process is a transition probabil- 
ity. Press et al. (2007); Hastings (1970); Chib & Greenberg (1995); 
Chib & Jeliazkov (2001). 

PiX2\Xl) ^ q{X2\Xi)a(Xi,X2) (X2*Xi) (11) 



6.3 Results of our Metropolis-Hastings Algorithm 

In figure (9) we can see the sampled data plotted in two dimen- 
sions. This mean that we now get a direct impression of our pa- 
rameter space. Since the density of the dots in figure (9) is an ex- 
pression of the number of occurrences. So just from this simple 
plot we can already estimate that the parameter m should be around 
m e [2.0; 2.4] and b is then from this visual inspection determined 
to be ii e [0;70]. Of cause these are very rough estimates. They 
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Figure 9. The sampled data points used to make our description of our 
parameter space. 



still tells us that both the OLS and the WLS seam to be fairly good 
estimates. Let us move on to the MCMC simulation. 



7 THE MARKOV-CHAIN MONTE-CARLO 

Let us begin this section by breaking down the term MCMC into 
the points of which it is build. So MCMC becomes MC — MC and 
we will then look at what a Markov-Chain is: 



7.1 Markov-Chain 

The Markov-Chain is a discrete random process with the property 
that the next state depends only in the current state. It is named 
after Andrei Andreyevich Markov (1856 - 1922). The Markov- 
Chain is a mathematical tool used for statistical modelling, it can 
be thought of as a frog jumping among several lily-pads, where the 
frog's memory is short enough that it does not recall what lily-pad 
it was last on. and therefore the next jump will only be influenced 
by where it is now. 

A little formally we can state what the what the Markov-Chain is 
all about like this: 

A Markov-Chain is a discrete random process with Markov prop- 
erty that goes on forever. A discrete random process is one where 
a system is in a certain state at each step, with the state changing 
randomly with the steps. The steps are often thought of as time, 
but they can as well refer to physical distance or any other discrete 
measurement. The steps are just the natural numbers, and the the 
random process is a sort of mapping of these to states of the sys- 
tem. The Markov property states that the conditional probability 
distribution for the system, at the next step given the current state 
depends only on the current state of the system, and not on the state 
of the system at any previous steps. 



Pix„+ ,\x,,X2,...,x„) = p{x„+ ,\x„) 



(12) 



Since the system changes randomly, it is generally impossible to 
predict the exact state of the system in the future. However, the sta- 
tistical properties of the system's future can be predicted. In many 
application it is these statistical properties that are important, Sagan 
(1981); Geyer (1992); Charlin et al. (2000). 
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7.2 Monte-Carlo 

Since the 1940's when scientists at Los Alamos Laboratories in 
the US invented the Monte-Carlo method (hence MC). Since then 
MC has been widely used in computational physics. Mainly due to 
it's computational friendliness, Metropolis & Ulam (1949). MC or 
MC integration, as it is often called is a method used for obtaining 
knowledge about systems which are beyond the reach of analyti- 
cal methods Metropolis & Ulam (1949). Suppose that we randomly 
pick A' uniformly distributed points, in a multidimensional volume 
V. These points will be referred to as xo, xi, . . . , xjv-i. Then we 
apply the underlying theorem of MC integration. 

J fdV « V{f) ± V .^|lZ!OZ>! (13) 

This theorem seen in equation (13) estimates the integral of the 
function / over the multidimensional volume. The angle brackets 
in equation (13) is simply the arithmetic mean offer the N sample 
points, 

</).-;^/(x,) </>-^Z^'(^'> (14) 

;=0 1=0 

One should notice the plus-minus sign in equation (13), this is only 
an one standard deviation error estimate for the integral, not a rig- 
orous bound. 

The MC approach have been used very extensively in the past, and 
through this approach we have learnt a lot, and it has proved it- 
self, as a robust way of determine integrals numerically. So if this 
method is so good an robust why do we then need a new one? 
There are several reasons, first of all most MC simulations uses 
only a pseudo-random sampling. Then depending on the random 
sampler one uses it will be determined whether or not it is a good 
estimate. Since it is dependant on the kind of random sampler one 
uses for the MC. 

There is no single Monte-Carlo method; instead the therm MC de- 
scribes a large and widely-used class of approaches. However, these 
approaches tend to follow a particular pattern: 

(i) one has to define a domain of possible inputs.. 

(ii) then it is time to generate some inputs randomly from the 
domain using a certain specified probability distribution. 

(iii) one needs to perform a deterministic computation using the 
inputs. 

(iv) one needs to combine the results of the individual computa- 
tions into the final result. 

We will not go into any further detail with the underlying theory 
of the Monte-Carlo method, since this has been the subject of one 
of the exercises in the course. In the mid 1980's a new approach 
to MC was presented to the world this was known as MCMC, the 
main difference between MC and MCMC is that MC is based on a 
random sampling while MCMC builds on the principle of random 
walk. 

7.3 Markov-Chain Monte-Carlo 

Like the MC integration, MCMC is a random sampling method, but 
unlike the MC integration the goal of the MCMC is not to sample 
a multidimensional region uniformly. The goal is rather to visit a 
point X with a probability proportional to a given distribution func- 
tion n(x). The distribution n(x) is not quite a probability, because it 
might not be normalized to unity when integrated over the sampled 
region. However it is proportional to the probability. 



The reason for applying this is that MCMC provide a powerful 
way of estimating parameters of a model and their degree of un- 
certainty. The typical case is that we have a set of data D, and we 
are able to calculate the probability of the dataset given the values 
of the parameters of the model x, which yield P(D\x). if we as- 
sume the prior P(x) Bayes' theorem, equation (8), will tell us that 
n(x) = P(D\x)P(x). In the form of Bayes' theorem given in equa- 
tion (8), the normalization is known, but it might be unknown. If 
we assume that the normilaztion is unknown,then n(x) is will not 
be the normalized probability density. But if we can sample from 
it, we can achieve information about any quantity of interest. This 
is interesting since we can use this fact to obtain information on 
the normalized probability density simply by observing how often 
we sample a given volume dx. Another maybe even more useful 
property is that we can observe the distribution of any single com- 
ponent or set of components. So one of the questions we should ask 
is; "Why is there at all a need to use MCMC instead of MC?" This 
is an interesting question, but it is fairly easy to answer, one of the 
major advantages of MCMC is that it automatically puts the sam- 
ples points preferentially where n(x) is large. In a high-dimensional 
space or where n(x) is expensive to compute, this can be advanta- 
geous by several orders of magnitude. 

Our next step is to assemble both parts of our Markov-Chain us- 
ing the Metropolis-Hastings algorithm and our Monte-Carlo algo- 
rithm in order to get a fully working Markov-Chain Monte-Carlo 
Charlinetal. (2000); Geyer (1992); Sagan (1981); Press et al. 
(2007); Cappe & Robert (2000). 



8 RESULTS 

We now need to extract some results from our simulation. This is 
done using the sampled data, from these we can create a histogram 
in either direction, of the parameter space. So we will in the end 
have a histogram for the parameter m and one for the parameter b. 
The way we go about making these histogram is by using the rou- 
tine gsl-histogram from the GNU scientific library. This routine 
is very simple to use since it is just passed on the commandline as: 
$ ) cat file.dat | gsl-histogram x^in x^^x «bins 
we can pipe the output directly into graph or to a new data file for 
plotting with gnuplot. Even for rather large data sets as the one we 
presents with 100 000 data points we still get our result from the gsl 
routine almost instantly. This ensures a high efficiency. However it 
also presents a problem. Since the histogram data are created by 
running the makefile as 
$ ) make histogram 

we need to compile the makefile several times to get it all done. 
Therefore I have chosen to make a shell-script to ensure that every- 
thing is done in the right order this script is called domake . sh. Let 
us have a look at the results. 

8.1 Estimating m 

We now have our sampled data from our MCMC and therefore want 
to determine m. We can estimate m by making a histogram along 
the m-axis of our sampling, see figure (9). This will result in the 
histogram showed in figure (10). In order to determine where the 
max is we can fit a Gaussian to this histogram. This can be done 
in two ways. Either by simple algebraic solving the equation of the 
Gaussian for the values provided by our sampling. This is the line 
with the long dashes in figure (10). The short dashed line indicate 
a Gaussian fit by the MCMC values. 
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Figure 10. Estimation of the parameter m. 



Figure 12. Estimation of the parameter b 
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Figure 11. Zoom in on the upper half of figure (10). Estimation of the 
parameter m. 
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Figure 13. Zoom in on the upper half of figure (12). Estimation of the pa- 
rameter b. 



These lines are not easy to separate from each other, which is why 
in figure (11) have a zoom of the upper part of figure (10). From 
this zoom one can as well see that the maxima's of the two lines 
are not exactly above each other. But they are shifted by a very tiny 
amount. 

From the estimation we get the results shown in table (3). And we 
see that the determination of the parameter is far better regarding 
the uncertainties associated since we here are looking at uncertain- 
ties of ±5% in m, for the algebraic fit and the MCMC fit, while we 
for the WLS fit have an uncertainty of ±15%. 
Another thing we should take notice of is that fact that with the 
MCMC we approximates the final result of the WLS far better than 
that of the algebraic solution even though it is only .02 percent that 
separates them. 



histogram showed in figure (12). In order to determine where the 
max is we can fit a Gaussian to this histogram. This can be done 
in two ways. Either by simple algebraic solving the equation of the 
Gaussian for the values provided by our sampling. This is the line 
with the long dashes in figure (12). The short dashed line indicate 
a Gaussian fit by the MCMC values. 

These lines are not easy to separate from each other, which is why 
in figure (13) have a zoom of the upper part of figure (12). From 
this zoom one can as well see that the maxima's of the two lines 
are not exactly above each other. But they are shifted by a very tiny 
amount. 

From the estimation we get the results shown in table (3). Again 
we see how the parameters fits the sampled data very well and we 
also see a nice correspondence between WLS and MCMC fits. 



8.2 Estimating b 

We now have our sampled data from our MCMC and therefore want 
to determine b. We can estimate b by making a histogram along 
the i-axis of our sampling, see figure (9). This will result in the 



9 PERSPECTIVES AND APPLICATIONS OF MCMC 

In this section we will briefly discuss some of the issues for which 
we can apply MCMC in the field of astrophysics. Within astro- 
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Table 3. Table showing the results of the algebraic solution versus the MCMC and the Isq fits. 



OLS 


WLS 


Algebra 


MCMC 


m 2.1910 
b 32.0040 


2.2325 ± 0.3309 
34.8495 ± 55.2679 


2.2321 ± 0.1087 
34.9215 ± 18.1203 


2.2324 ± 0.1103 
34.824 ± 18.4226 



physics the MCMC technique have become very popular over re- 
sent years. Especially in fields as cosmology and extra-solar planet 
hunting. So what do the scientists use this power full tool for? 



9.1 Extra-solar planet hunting 

In the field of Extra-solar planet (hence exoplanet) hunting MCMC 
is widely used to determine the planet-star -radius ratio, inclination 
angle, etc, see Gibson et al. (2009, 2010b) for further details of the 
exact method used. The systemic inclination is of severe impor- 
tance an it is therefore necessary to get a very good and robust de- 
termination of this value. Even though we know it to be 90° ±5- 10° 
since we would not see the transit when the planet passes in front 
of the star otherwise. Likewise it is also favourable that one gets 
very robust uncertainties with the MCMC approach. 
In general it is possible to derive a lot of different parameters which 
are of interest in the exoplanet domain by the method of MCMC, 
see Hrudkova et al. (2010) whom uses MCMC to derive all sys- 
temic parameters and their uncertainties. Other references might 
be Gibson et al. (2010a); Watson et al. (2010) 
Another very interesting thing in the astronomy domain is a 
new website called astrometry.net. This website can calibrate any 
picture of the night sky simply by using a MCMC algorithm 
Lang et al. (20 1 ). So if you are wondering what you are looking at 
when looking upon the night sky you can simply take a photo with 
your cell phone and upload it to the website and you will instantly 
get information on that region of the sky. 



9.2 MCMC and cosmology 

Another field where MCMC have been applied is cosmology this 
has basically been done by the same research team who are respon- 
sible for the astrometry.net site. They have undertaken an enor- 
mous task by starting to look at the Hipparcos catalogue and try 
to classify the stars in the sample by their velocity distribution for 
this they have used MCMC see Bovy & Hogg (2010); Bovy et al. 
(2009, 2010). 



10 CONCLUSION 

During this paper we have gone through some of the basic theory 
of MCMC and we have tested the principle of MCMC on a rather 
simple dataset, and see that this is indeed a robust and powerful 
tool when working with observed data. We can conclude that the 
MCMC method is just as good as the standard WLS, the major 
advantage of the MCMC compared to the WLS is that the uncer- 
tainties is a factor three better than those of the WLS. Some of the 
disadvantages is however that the MCMC is rather complex, and 
depending on your parameter space it might not be straight forward. 
Other methods we could have compared could be optimisation by 
the simplex method. We have chosen not to do that. 
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